Size-Dependency of Income Distributions and Its Implications

This paper highlights the size-dependency of income distributions, i.e., the systematic change of
the income distribution with the population of a country. By using the generalized Lotka-Volterra
model to fit the empirical income data of the United States during 1996-2007, we found that an
important parameter $\lambda$ scales as a power of the size (population) of the U.S. in that year,
with exponent $\beta$. We point out that the size-dependency of income distributions, a very
important property that is seldom addressed by previous studies, has two non-trivial implications:
(1) the allometric growth pattern, i.e., the power law relationship between population and GDP in
different years, which can be mathematically derived from the size-dependent income distributions
and is also supported by the empirical data; (2) a connection with the anomalous scaling of the
probability density function in critical phenomena, since the re-scaled form of the income
distributions has exactly the same mathematical expression as the asymptotic limit distribution of
the sum of many correlated random variables.

I. INTRODUCTION

The power law distribution of incomes in a nation is one of the most
important universal patterns found in economic systems, owing to the
seminal work of Pareto [1]. It holds not only for incomes and wealth in
different countries and different years but also for other complex
systems, e.g., languages [?] and complex networks [?]. Although this
statistical law is supported by many empirical data [?] and theoretical
works [?], it can only describe the distribution of high incomes. Some
recent studies have shown that the distribution for the great majority
of the population can be described by an exponential function, which is
very different from the power law in the high incomes [11]. Silva and
Yakovenko [?] defined these different income intervals as thermal and
super-thermal regions, whose dynamics may follow very different rules.

A recent paper of our group discussed how the income distribution curves
in China change with time [12], so a question arises: do the distribution
curves change with the system size? As we know, some early studies of
family names have pointed out that the distributions can change with the
size of the system [13]-[15]. This size-dependency of distributions is
also found in languages. In this paper, we propose that income
distributions also have this size-dependency property, which means that
the distribution curves change systematically with the system size (the
population). In Section II we use a revised form of the generalized
Lotka-Volterra model to fit the empirical income data of the United
States during 1996-2007. In this formula, we insert a scale factor
$\lambda$ which changes with the population as a power law with exponent
$\beta$ in different years. So the size-dependency of the income
distribution is implicitly revealed by this power law relationship.

In Section III we also point out that the size-dependency of income
distributions actually implies the power law relationship between
population and GDP, i.e., the allometric growth (scaling) phenomenon,
which is also found in various complex systems such as ecological
systems [17]-[20], cities [21]-[23] and countries [24], [25]. We have
also tested this relationship with the empirical data. Some studies of
family names [13] and languages [26]-[27] have linked the patterns of
power law distributions and power law relationships between two
variables. However, in this study, we argue that the exponent of the
power law relationship between population and GDP does not depend on the
Pareto exponent of the income distribution but on the size-dependency
exponent.

Furthermore, the re-scaled form of the income distribution curves can be
re-expressed in a generalized mathematical form in Section IV. This
formula has actually been found to describe the anomalous scaling of the
probability density function in critical phenomena, e.g., spin systems
[28], [29], where the re-scaled distribution form can be treated as the
limit distribution of the sum of a large number of correlated random
variables [28, 29]. Therefore, the size-dependency of the income
distribution also implies a connection with the central limit theorem
for correlated variables.



II. SIZE-DEPENDENT INCOME 
DISTRIBUTIONS 

The personal-income distribution data of the United States are compiled
by the Internal Revenue Service (IRS) from the tax returns in the USA
for the period 1996-2007 (presently the latest available year [30]). The
original data give the percentage in given income intervals. We plot the
cumulative distributions in Figure 1. Notice that the income data here
are nominal incomes, so the effects of inflation are not excluded.
Therefore, the GDP data we use in the following parts are also nominal,
i.e., unadjusted for the effects of inflation.

As pointed out in [?], the distribution curves exhibit an exponential
form in low incomes and a power law distribution in high incomes.
However, we use the generalized Lotka-Volterra (LV) model [?, 31]
instead of the method in [?] to fit these data, since the generalized LV
model only needs two free parameters. We assume that the density curves
of the income distributions in different years follow the equation



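$$f(x) = \lambda \, \frac{(\alpha-1)^{\alpha}}{\Gamma(\alpha)} \, \frac{\exp\left(-\frac{\alpha-1}{\lambda x}\right)}{(\lambda x)^{1+\alpha}}, \qquad (1)$$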



where $\alpha$ and $\lambda$ are the parameters to be estimated and
$\Gamma(\cdot)$ is the Euler gamma function. Note that in the original
form of the generalized LV model [31] there is no factor $\lambda$, since
the main purpose of that paper is to explain the shape of the income
distribution. However, we must insert this factor in Equation (1)
because we care not only about the shape of the income distribution
curves but also about their dependency on size (the population of a
country) in different years, and this size-dependency can only be
reflected by $\lambda$. In addition, $\alpha$ is the Pareto exponent in
the high-income regime, because Equation (1) has a truncated power law
form. Nevertheless, we fit the cumulative distribution function instead
of Equation (1) directly, in order to reduce the estimation errors.
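As a worked step, the tail probability implied by Equation (1) can be
obtained by direct integration. The following is a derivation sketch
from Equation (1), not a quotation of the authors' own formula; the
substitution variable $u$ and the symbol $\gamma(\cdot,\cdot)$ for the
lower incomplete gamma function are notation introduced here.

```latex
\begin{aligned}
C(x) &= \int_x^{\infty} f(t)\,dt
      = \frac{1}{\Gamma(\alpha)} \int_0^{\frac{\alpha-1}{\lambda x}} u^{\alpha-1} e^{-u}\,du
      \qquad \left(u = \frac{\alpha-1}{\lambda t}\right) \\
     &= \frac{\gamma\!\left(\alpha, \frac{\alpha-1}{\lambda x}\right)}{\Gamma(\alpha)} .
\end{aligned}
```

For large $x$ the upper limit of the integral is small, so
$C(x) \approx \frac{1}{\Gamma(\alpha+1)} \left(\frac{\alpha-1}{\lambda x}\right)^{\alpha}$,
which is a Pareto tail with exponent $\alpha$.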
Here the function $C(x)$ is the probability that a person's income is
larger than $x$. A two-step fitting method is used in this paper. First,
we apply Equation (2) to the empirical data year by year and obtain the
best estimates of the parameters $\alpha$ and $\lambda$. Here, "best"
means that the total distance between the empirical data and the
theoretical curves in log-log coordinates is minimized. The values of
$\alpha$ derived in the first step are (1.59043, 1.60112, 1.61717,
1.67562, 1.75147, 1.76187, 1.71051, 1.61152, 1.80999, 1.95683, 1.87374,
1.92885); they fluctuate around the mean value 1.74076. Second, we fix
$\alpha = 1.74076$ and use the same method to obtain the best estimate
of $\lambda$ again. We will show that the size-dependency and its
implications are actually independent of $\alpha$. The reason for using
this two-step fitting method is to get a better estimate of $\lambda$,
which is more important than $\alpha$. From Figure 1 we can see that the
distributions change regularly over time: as time goes by, the
distribution curves shift. This trend is more obvious in the scaling
regions (high-income tails).
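The second step of the fit (fixed $\alpha$, free $\lambda$) can be
sketched in a few lines. This is a minimal illustration, not the authors'
code: it assumes that the tail probability of Equation (1) is the
regularized lower incomplete gamma function evaluated at
$(\alpha-1)/(\lambda x)$, which follows from integrating Equation (1),
and it measures distance as the sum of squared differences of
$\log C(x)$, which is one possible reading of the "total distance"
criterion. The function names, the search bracket and the synthetic data
points are all hypothetical.

```python
import math


def log_glv_ccdf(x, alpha, lam, max_terms=10_000):
    """Return log C(x), the log-probability that income exceeds x under Eq. (1).

    C(x) is the regularized lower incomplete gamma function P(alpha, z) with
    z = (alpha - 1) / (lam * x), evaluated by its power series
    P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a * (a+1) * ... * (a+n)).
    The series is only practical for moderate z (roughly z < 500).
    """
    z = (alpha - 1.0) / (lam * x)
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= z / (alpha + n)
        total += term
        if term < 1e-17 * total:
            break
    return math.log(total) + alpha * math.log(z) - z - math.lgamma(alpha)


def fit_lambda(incomes, tail_probs, alpha, lam_lo=1e-6, lam_hi=1e-3, iters=80):
    """Golden-section search for the lambda minimizing the log-log squared distance."""

    def loss(log_lam):
        lam = math.exp(log_lam)
        return sum((math.log(c) - log_glv_ccdf(x, alpha, lam)) ** 2
                   for x, c in zip(incomes, tail_probs))

    a, b = math.log(lam_lo), math.log(lam_hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if loss(c) < loss(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return math.exp(0.5 * (a + b))


# Hypothetical check on synthetic data: generate tail probabilities from a known
# lambda and confirm that the search recovers it (alpha fixed at the paper's mean).
ALPHA = 1.74076
TRUE_LAMBDA = 2.0e-5                       # made-up value, i.e. mean income 50,000
incomes = [1.0e4 * 1.5 ** k for k in range(12)]
tail_probs = [math.exp(log_glv_ccdf(x, ALPHA, TRUE_LAMBDA)) for x in incomes]
fitted_lambda = fit_lambda(incomes, tail_probs, ALPHA)
```

The first step of the procedure would repeat the same search with
$\alpha$ also free, for example over a grid of candidate $\alpha$ values
for each year.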
FIG. 1. Income Distributions of the U.S. in the period 1996-2007.
FIG. 2. The Power Law Relationship between the Population and $\lambda$
during 1996-2007.

The relationship between the best estimates of $\lambda$ and the years
is shown in the legend of Figure 1. Furthermore, we know that the
population of the U.S. increases with time in these years. Therefore,
$\lambda$ is actually a function of the population in a given year. This
functional relationship can be presented as a power law relationship
between population and $\lambda$, as Figure 2 shows. From this figure,
we observe an apparent trend that $\lambda$ decreases with population.
This trend can be approximated by a power law relationship between
$\lambda$ and population.



$$\lambda \sim P^{-\beta}, \qquad (3)$$



where $\beta$, estimated as 4.365, is the magnitude of the slope of the
line in Figure 2. Therefore, we conclude that the income distributions
are size (population) dependent. This dependency is described by the
power law relationship between the scale parameter $\lambda$ and the
population.
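The exponent in Equation (3) is the slope of a straight-line fit in
log-log coordinates. A minimal sketch of that regression is given below;
ordinary least squares on the logarithms is assumed here as the
estimator, and the population and $\lambda$ values are synthetic numbers
generated from a known exponent, not the paper's data.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + k * log(x); returns (k, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    k = sxy / sxx
    return k, math.exp(mean_y - k * mean_x)


# Synthetic illustration (hypothetical values): twelve yearly populations and
# scale parameters that follow lambda = c * P**(-beta) exactly.
BETA_TRUE = 4.365
populations = [2.70e8 + 2.8e6 * year for year in range(12)]
lambdas = [1.0e32 * p ** (-BETA_TRUE) for p in populations]

slope, prefactor = fit_power_law(populations, lambdas)
beta_hat = -slope          # lambda ~ P**(-beta), so beta is minus the fitted slope
```

With real data the points scatter around the line, and with only twelve
yearly points over a narrow population range the fitted slope is
sensitive to noise in the individual $\lambda$ estimates.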

As a result, the income distributions in different years can be
re-scaled by the factor $P^{-\beta}$. The re-scaled curve of the income
distribution is shown in the inset of Figure [?].

Although the power law relationship of Equation (3) is acceptable
because its goodness of fit is high enough, there are still large
deviations from the empirical data in Figure 2. We guess that the main
errors come from the income distribution fittings. In Figure 1 there are
some deviations of the theoretical income distributions from the
empirical data, and these errors can influence the estimates of
$\lambda$ very dramatically, since the values of $\lambda$ are very
small. The second reason is that we have very few samples here (only 12
years), so the noise in the original data cannot be eliminated. Thus,
Equation (3) is just an approximation; however, this does not prevent us
from obtaining an asymptotic theory. Next, we discuss the two important
implications of this size-dependency.

FIG. 3. Power Law Relationship between the Population and the GDP of the
U.S. during 1996-2007



III. POWER LAW RELATIONSHIP BETWEEN 
POPULATION AND GDP 

We now show that the size-dependency of income distributions implies the
power law relationship between population and GDP. First, we know that
the GDP of a country is proportional to the total income of all people.
Second, the total income can be read from the income distribution curve.
We can write down the equation

$$X \sim P \langle I \rangle, \qquad (5)$$

where $X$ is the GDP, $P$ is the population in a given year, $I$ is the
random variable income in a given year, and $\langle I \rangle$ stands
for the ensemble mean value of incomes. Therefore, $P \langle I \rangle$
is just the total income of the whole country in the given year.

Then we can calculate the mean income from the cumulative probability
function (Equation (2)) as follows,

$$\langle I \rangle = \int_0^{\infty} x f(x)\,dx = \int_0^{\infty} C(x)\,dx = \frac{1}{\lambda}. \qquad (6)$$

Therefore,

$$X \sim P/\lambda. \qquad (7)$$

Substituting Equation (3) into Equation (7), we get

$$X \sim P^{1+\beta}. \qquad (8)$$

Equation (8) is just the power law relationship (allometric growth)
between population and GDP with exponent $1 + \beta$. We have estimated
the exponent $\beta \approx 4.365$ from the income distributions.
Therefore, we predict that the GDP scales as the 5.365 power of the
population in the United States during 1996-2007.
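The value $1/\lambda$ in Equation (6) can be checked directly from the
density in Equation (1). The following worked step is a sketch under the
assumption $\alpha > 1$ (needed for the mean to be finite); the
substitution variable $u$ is introduced here for illustration.

```latex
\begin{aligned}
\langle I \rangle
  &= \int_0^{\infty} x\, \lambda\, \frac{(\alpha-1)^{\alpha}}{\Gamma(\alpha)}\,
     \frac{\exp\!\left(-\frac{\alpha-1}{\lambda x}\right)}{(\lambda x)^{1+\alpha}}\,dx
     \qquad \left(u = \frac{\alpha-1}{\lambda x}\right) \\
  &= \frac{\alpha-1}{\lambda\,\Gamma(\alpha)} \int_0^{\infty} u^{\alpha-2} e^{-u}\,du
   = \frac{(\alpha-1)\,\Gamma(\alpha-1)}{\lambda\,\Gamma(\alpha)}
   = \frac{1}{\lambda}.
\end{aligned}
```

The result does not involve $\alpha$, which is why the predicted GDP
exponent $1 + \beta = 5.365$ depends only on how $\lambda$ scales with
the population.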

On the other hand, we can obtain the real data of the population and the
GDP of the United States during the given period. The two variables have
a power law relationship, which is shown in Figure 3. From the empirical
data, we estimate that the power law exponent is about 5.065, which is
close to the exponent we predicted from the size-dependent income
distribution data (the relative error is
$|5.365 - 5.065|/5.065 \approx 6\%$). However, there is still a small
deviation between the empirical exponent and the predicted one. The
possible error sources include (1) the income curves; (2) the estimation
of $\beta$; (3) the deviation between GDP and total income.



IV. GENERALIZED SIZE-DEPENDENT 
INCOME DISTRIBUTION 

An interesting fact which deserves more attention is that the
size-dependency of the income distribution and its implication, the
power law relationship between population and GDP, are independent of
the Pareto exponent $\alpha$ in the income distribution formula,
Equation (1). Therefore, we can further hypothesize that the
size-dependency of the distribution is a unique property, independent of
the concrete form of the density function.

Actually, from the re-scaled distribution in Section II we can
generalize an abstract form of the probability density function,

$$f(x) \sim P^{-\beta} g(P^{-\beta} x), \qquad (9)$$

where $g(y)$ is an arbitrary probability density function with
size-independent argument $y$. When we set $g(y)$ to the concrete form
$\frac{(\alpha-1)^{\alpha}}{\Gamma(\alpha)} \frac{\exp(-(\alpha-1)/y)}{y^{1+\alpha}}$,
we get the generalized LV model in Equation (1).

Actually, the power law relationship between population and GDP
discussed in Section III can be derived from the abstract form
(Equation (9)), because

$$\langle x \rangle = \int_0^{+\infty} x f(x)\,dx \sim \int_0^{+\infty} x P^{-\beta} g(P^{-\beta} x)\,dx. \qquad (10)$$

Replacing the integration variable $x$ with $y = P^{-\beta} x$, we obtain

$$\langle x \rangle \sim \int_0^{+\infty} P^{-\beta} P^{\beta} y\, g(y) P^{\beta}\,dy = P^{\beta} \int_0^{+\infty} y\, g(y)\,dy \sim P^{\beta}, \qquad (11)$$

where $\int_0^{+\infty} y\, g(y)\,dy$ is a constant because $g(y)$ is
size-independent. Finally, we also obtain the power law relationship
between population and GDP,

$$X = P \langle x \rangle \sim P^{1+\beta}. \qquad (12)$$

So we can conclude that the essence of the size-dependency of the income
distribution is captured by Equation (9). Actually, this re-scaled form
of the distribution was not first discovered in this paper. In [28, 29],
the authors gave a similar formula to describe the anomalous scaling of
the probability density function in critical phenomena,

$$f(x) \sim n^{-D} g(n^{-D} x), \qquad (13)$$

where $n$ is the size of the system (the number of addends) and $D$ is
also a re-scaling exponent. The same mathematical form points to a
common underlying law, so the individual income can be viewed as a sum
of many correlated random variables related to each person in the same
country. However, we will not discuss the details of this finding here
and leave them to future studies because of the size limitation of this
paper.
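The derivation in Equations (10)-(12) does not depend on the shape of
$g$, which can be illustrated numerically. The sketch below is an
illustration only: it picks $g(y) = e^{-y}$ as an arbitrary stand-in
density, uses the paper's $\beta = 4.365$, takes two hypothetical
population values, and integrates with a simple midpoint rule; none of
these choices comes from the paper's data.

```python
import math

BETA = 4.365


def g(y):
    """Stand-in size-independent density on y >= 0 (unit exponential)."""
    return math.exp(-y)


def mean_income(population, beta=BETA, y_max=60.0, steps=200_000):
    """Mean of f(x) = P**(-beta) * g(P**(-beta) * x) by the midpoint rule."""
    scale = population ** beta            # x = scale * y
    x_max = scale * y_max                 # integrate where g is non-negligible
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += x * g(x / scale) / scale * dx
    return total


p1, p2 = 2.7e8, 3.0e8                     # hypothetical populations
m1, m2 = mean_income(p1), mean_income(p2)

# Exponent of X = P * <x> between the two sizes; Eq. (12) predicts 1 + beta.
gdp_exponent = math.log((p2 * m2) / (p1 * m1)) / math.log(p2 / p1)
```

Swapping `g` for any other density with a finite mean changes only the
constant $\int y\, g(y)\,dy$, not the fitted exponent.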



ignored more or less by previous studies. The size-dependency has two
important implications: (1) the power law relationship between
population and GDP (which is also known as allometric growth); (2) the
re-scaled income distribution has the same mathematical form as the
anomalous scaling probability density function of the sum of many
correlated random variables. However, due to the limitation of our data,
the results discussed in this paper hold only for the United States of
America, a particular developed country, and only for the period
1996-2007, which was a very stable time for the United States. Using
another data set, we have observed that the allometric growth pattern is
not found for some countries, especially nations experiencing upheaval
or inflation. Thus, we hypothesize that the size-dependency of the
income distribution, especially the power law relationship between
$\lambda$ and population, will not be observed in these cases either.

In addition, we have found the same size-dependency phenomena in human
online behaviors [33]; therefore, it is reasonable to regard some
results of this paper as common to stably developing complex systems.